Pipelining and bypassing in a VLIW processor

Authors

Arthur Abnous

Nader Bagherzadeh

Abstract

This short note describes issues involved in the bypassing mechanism for a very long instruction word (VLIW) processor and its relation to the pipeline structure of the processor. We first describe the pipeline structure of our processor, analyze its performance, and compare it to typical RISC-style pipeline structures in the context of a processor with multiple functional units. Next, we study the performance effects of various bypassing schemes in terms of their effectiveness in resolving pipeline data hazards and …
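To make the idea of bypassing and pipeline data hazards concrete, here is a minimal sketch. It does not model the authors' VLIW pipeline, which is not described on this page. It assumes a generic single-issue, in-order, five-stage RISC-style pipeline (IF, ID, EX, MEM, WB) with single-cycle ALU operations. The function name `count_stalls`, the delay constants, and the sample program are all hypothetical and chosen only for illustration.

```python
# Assumed model (not from the paper): a producer executes in EX at cycle e.
# Without bypassing, a consumer reads the register file in ID only after the
# producer's WB stage, so its own EX can start at e + 3 at the earliest.
# With full bypassing, the EX result is forwarded to the next EX at e + 1.
NO_BYPASS_DELAY = 3
FULL_BYPASS_DELAY = 1


def count_stalls(program, result_delay):
    """Count stall cycles caused by read-after-write hazards.

    program: list of (dest_register, source_registers) tuples in issue order.
    result_delay: cycles from a producer's EX to the earliest consumer EX.
    """
    ready = {}      # register -> earliest EX cycle at which it can be consumed
    prev_ex = 0     # EX cycle of the previous instruction
    stalls = 0
    for dest, sources in program:
        earliest = prev_ex + 1  # in-order issue, one instruction per cycle
        needed = max((ready.get(reg, 0) for reg in sources), default=0)
        ex = max(earliest, needed)
        stalls += ex - earliest
        ready[dest] = ex + result_delay
        prev_ex = ex
    return stalls


# A hypothetical chain of dependent ALU operations.
PROGRAM = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),
    ("r6", ("r4", "r1")),
]

if __name__ == "__main__":
    print("stalls without bypassing:", count_stalls(PROGRAM, NO_BYPASS_DELAY))
    print("stalls with full bypassing:", count_stalls(PROGRAM, FULL_BYPASS_DELAY))
```

Under these assumptions, the dependent chain stalls for four cycles without bypassing and for none with full bypassing. A partial bypassing scheme could be approximated in the same model by choosing an intermediate delay.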


Similar resources

Optimization of SAD Algorithm on VLIW DSP

The SAD (Sum of Absolute Difference) algorithm is heavily used in motion estimation, which is a computationally demanding process in motion picture encoding. To enhance the performance of motion picture encoding on a VLIW processor, an efficient implementation of the SAD algorithm on the VLIW processor is essential. The SAD algorithm is programmed as a nested loop with a conditional branch. In VLIW pro...


Architectural Design and Analysis of a VLIW Processor

Architectural design and analysis of VIPER, a VLIW processor designed to take advantage of instruction level parallelism, are presented. VIPER is designed to take advantage of the parallelizing capabilities of Percolation Scheduling. The approach taken in the design of VIPER addresses design issues involving implementation constraints, organizational techniques, and code generation strategies. ...


Software Pipelining and Superblock Scheduling: Compilation Techniques for VLIW Machines

© Copyright Hewlett-Packard Company 1992. Compilers for VLIW and superscalar processors have to expose instruction-level parallelism to effectively utilize the hardware. Software pipelining is a scheduling technique to overlap successive iterations of loops, while superblock scheduling extracts ILP from frequently executed traces. This paper describes an effort to employ both software pipelining...


Software pipelining for Jetpipeline architecture

High-performance processors based on pipeline processing play an important role in scientific computation. In our previous work, we proposed a hybrid pipeline architecture named Jetpipeline. The concept of Jetpipeline comes from the integration of superscalar, VLIW, and vector architectures. Jetpipeline has multiple instruction pipelines, which execute multiple instructions like superscalar ar...


Time Optimal Software Pipelining of Loops with Control Flows for VLIW Processors

Software pipelining is widely used as a compiler optimization technique to achieve high performance in machines that exploit instruction-level parallelism such as superscalar or VLIW processors. However, surprisingly, there have been few theoretical results on the optimality of software pipelined loops with control flows. The problem of time optimal software pipelining of loops with control flo...




Publication date: 1992
